TITLE: NETWORK BASED CLASSIFIED INFORMATION SYSTEMS
FIELD OF INVENTION 

This invention relates to network based classified information systems, to methods of
automatically building searchable databases of classified information derived from web pages
posted on a network, and to web pages for use in such systems and methods.

The information systems and databases of most relevance to this invention are those which
include classified product and service catalogues similar to the Yellow Pages telephone books,
contact indexes similar to the White Pages telephone books, and/or subject indexes similar to
library catalogues. Such information systems and databases typically include sets of
associated classification, contact and/or geographic items of information. For convenience,
classification, contact and/or geographic information will be hereinafter called CCG-data.

The networks with which this invention is concerned are the worldwide public
computer/communications network commonly known as the Internet and private networks -
sometimes called intranets - which allow common access to markup documents on computers
connected to the network. Markup documents are text files prepared using various markup
languages such as HyperText Markup Language (HTML) and Extensible Markup Language
(XML), which are applications (or dialects) of the Standard Generalised Markup Language
(SGML). The system of accessible files on the Internet is called the World Wide Web (WWW)
and the markup documents themselves are commonly called 'web pages'. A web page is said
to be 'posted' on a network when it is stored on computer-readable media of a host network
computer as a file which is generally accessible to network users. A web page is transported
from the host computer to a requesting computer through intermediate network computers as
a computer-readable signal embodied in a carrier wave. Though this invention is not limited to
Internet based information systems, these terms are used for convenience.

BACKGROUND TO THE INVENTION
It has been estimated that there are about 100 million web pages on the Internet and that the
number is doubling every two years. Many of these pages include information concerning
commercially offered goods and services and often include contact details. But the difficulty of
locating such information is increasing faster than the growth in the number of web pages.

To assist network users locate web pages of interest, certain network service providers create
indexes (or databases) of the contents of web pages posted (stored on computer readable
media so as to be generally accessible) on the network and provide 'search engines' to use
the indexes. These indexes are often created automatically by the use of 'web crawlers' which
(i) interrogate computer after computer on the network to locate successive web pages and (ii)
index the words in each web page encountered against the network address (eg Internet
Protocol Address or IPA) and filing system path or universal resource locator (URL) at which
the web page is accessible. Hereinafter the terms URL and URI (Uniform Resource Identifier)
are taken to be identical in meaning and to signify network addresses and filing system paths.
Such an index typically consists of a set of unique words with each word having an associated
list of URLs of the web pages wherein the word was found to occur during interrogation. The URL
serves as a 'hyperlink' which, if selected by a user/searcher, results in the associated web
page being automatically transmitted from the computer where it is posted on the network to
the user/searcher's computer where it may be displayed or otherwise processed. The sending
and receiving of files in this way is greatly assisted by user interface programs called 'web
browsers' (or more simply, 'browsers') such as Netscape and Microsoft Internet Explorer.

The search for web pages of interest using search engines leaves much to be desired:

• simple searches (those using a few keywords in simple combinations) often yield far too
many web page references (URLs) to permit them to be interrogated one-by-one,

• complex searches (those using many keywords and/or complex Boolean expressions)
require considerable expertise to undertake,

• even using optimum search criteria, many irrelevant web pages are referenced because of
inconsistent use of terminology by those who author the original web pages,

• even using optimum search criteria, many relevant pages are missed, again because of
inconsistent use of terminology by web page authors, and

• because items of information included in the body of web pages cannot be 'understood' or
associated in useful ways by web crawlers; that is, recognised as, say, a surname, a street
name, a geographic locality, or type of goods or services and, say, a surname strongly
associated with a street name, a geographic locality, or a type of goods or service.
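
The kind of word index attributed above to web crawlers, in which each unique word is listed against the URLs of the pages containing it, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `build_word_index`, the tokenising rule and the `example.com` pages are assumptions made for the example and do not describe any particular search engine.

```python
import re
from collections import defaultdict


def build_word_index(pages):
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


# Hypothetical pages standing in for documents fetched by a crawler.
pages = {
    "http://example.com/a": "Plumbers in Sydney",
    "http://example.com/b": "Sydney florists",
}
index = build_word_index(pages)
print(sorted(index["sydney"]))
```

Because such an index records only words, a search for "sydney" returns both pages whether the word denotes a locality, a surname or a street name, which is the weakness the final point of the list describes.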

The result is that information provided by search engines from databases which are
automatically compiled using web crawlers is a very poor equivalent of the common Yellow
Pages and White Pages directories which serve the telephone industry (though these
directories are not, of course, automatically compiled from web pages).

In an attempt to improve the usefulness of automatically compiled network databases, some
search engine providers make use of information contained in URLs, such as the country code
and top level domain name codes such as 'com', 'edu', 'net' and 'org' which are sometimes used
to signify the subject matter of web pages. It has been proposed to add more content
classifying codes to URLs (eg, 'chem' to signify chemical subject matter) to allow specialised
databases - national, commercial, chemical, etc - to be generated. However, this proposal
has serious drawbacks:

• URLs are Internet addresses and it is in principle undesirable to confuse the address
function of a URL with that of representing a list of web page classifications or contact
details.

• A URL is an inappropriate container of multiple web page classification codes and contact
details because the length of the URL would cause it to become unwieldy as an Internet
address.

• Including in a URL classification codes drawn from a list of thousands of codes would
compromise the mnemonic quality of Internet addresses such as 'www.yellowpages.com'.

• There is substantial overlap in the subject matter contained in web pages having the
various top level domain name codes.

• There is no consensus on, or standard for, content classification codes in URLs.

Another proposal to add content classification data to web pages has arisen from the wish to
identify pages containing material that may be offensive to some viewers, or should not be
accessed by minors. The Platform for Internet Content Selection (PICS) (see
http://www.w3.org/pub/WWW/PICS and other documents at www.w3.org) is a web page
ratings standard similar in principle to the ratings systems for motion pictures. This system
allows page authors to 'internally' self classify their pages through use of the '<meta...>'
HTML element. Alternatively, 'external' PICS ratings of web pages may be obtained from
ratings service providers accessed each time a URL is selected. In practice, the ratings service
providers have adopted a very limited range of web page classifications. For example, Ararat
Software's Commercial Rating System (see http://www.ararat.com/ratings/ararat10.html)
provides just 5 categories of web page content: commercial content, technical/customer
support, ordering information, downloading information and contact information. In other
examples, CyberPatrol (http://www.microsys.com/pics/pics_msi.htm) provides 16 categories,
the Recreational Software Advisory Council (http://www.rsac.org/faq.html) provides 4 [...]
Vancouver Webpages Rating Service (http://vancouver-webpages.com/VWP1.0/) provides [...]
categories. None of the categories provide classification of web pages by industry [...].
Rather the categories are intended to prevent web browsers from displaying web pages
unsuitable for particular types of web browser users. Such rating systems are unsuited to
the automated creation of Yellow or White pages like databases from web pages [...]
contact [...] the data may only be encoded in the <meta...> element in the <head> section
of the document, drastically limiting the type and usefulness of the data that can be encoded.

The Meta Content Format (MCF) proposal (see http://mcf.research.apple.com/mcf.html)
requires the content of web pages to be [...] and the classification data to be held in a
separate non-HTML data file with a MIME [...] HTML encoded documents is a technical and
economic barrier to the adoption by search engines [...]. The MCF proposal is [...] unsuited
to the automated creation of Yellow or White pages like databases from web pages (MIME type
text/html) because data stored according to the MCF proposal is not stored in HTML encoded
web pages.

The 'Electronic Business Card', vCard, (see 'vCard The Electronic Business Card Version
2.1', versit Consortium Specification, Sept 18, 1996 or
http://ds.internic.net/internet-drafts/draft-ietf-asid-mime-vcard-01.txt) uses a non-HTML data
file (MIME Content Types of 'text/plain' or the non-standard 'text/x-vCard') containing contact
information equivalent to an extended [...]. vCards can be exchanged on a network using
Simple Mail Transfer Protocol [...]. A vCard may be associated with a web page by use of a URL
in the web page [...] (eg <a href="http://www.thing.com/vCard.vcf">My vCard</a>). Version 2.1
of the vCard standard data file format (published 18 September 1996) provides for the
inclusion of many items of contact information. The vCard specification recommends that,
where possible, there should be consistent mapping of vCard property [...] HTML
<input name=...> [...]. The intention is to facilitate the transfer of vCard data into web
page input forms by pasting from a clipboard or by dragging from other computer applications.
The vCard proposal is unsuited to the automated creation of Yellow or White pages like
databases [...] according to the vCard proposal is not stored in HTML encoded web pages.

[...] information in separate documents (such as Meta Content files or vCards) has the
disadvantage that there is necessarily much duplication of data and coordination of
modifications between the separate documents and the web pages. This must [...] to determine
whether [...] worth calling up the [...] to allow [...] contextual information would have to
be [...] this functionality.

Another disadvantage is that non-HTML documents such as vCards contain no details as to
how [...] to be displayed. In the display of HTML documents the position, font size, colour of
the text and other elements of the document are of great importance. The restriction of
address data in a vCard to untagged ordinally organised fields is inflexible. For example,
multiple instances of extended parts of the address are not possible. Also components of
names, addresses and telephone numbers and so forth are insufficiently identified.

The Online Computer Library Center Inc (OCLC, Dublin, Ohio, USA) proposal, known as the
'Dublin Core', proposes classifying scholarly web pages by subject (topic of the work or
keywords that describe the content of the work), title, author, publisher, other agent, date,
object type (genre of the object such as home page, novel, poem etc), form, identifier, source,
language, relationship and coverage (spatial and temporal) (see
http://www.oclc.org:5046/[...] and other documents at www.oclc.org). This
proposal does not include industry, service, product or subject classifications. It also does not
include contact details. Names such as that of the author are not specified in sufficient detail to
avoid ambiguities such as which are the author's first and last names. The proposal specifies
that the details are encoded using the <meta...> element in the <head> of web pages. The
proposal is unsuited to the automated creation of Yellow or White pages like databases from
web pages because the proposal does not provide for classification of web pages and does
not provide adequate contact details. Further, the use of keywords for describing the content
of the work adds very little to the effectiveness of indexing of web pages since the web pages
are usually indexed on every word of their content and most often the key words would simply
be a duplication of words already contained in the document.

It has also been proposed to use the Dewey Decimal System (see
http://orc.rsch.oclc.org:6109/eval_dc.html and http://orc.rsch.oclc.org:6109/bintro.html) to rank
electronic documents against a Dewey Decimal subject classification. The proposal suggests
automatically assigning Dewey Decimal subject classification codes to documents during
automated indexing and cataloguing but does not specify the exact nature of the assignment
although it is implied that the codes are stored separately from the documents. The proposal
admits that such automated classification is less satisfactory than human classification. The
proposal is unsuited to the automated creation of Yellow or White pages like databases from
web pages because the accuracy of classification is inadequate, it does not provide for inclusion
of industry, service or product classifications and it does not provide for inclusion of contact
details. Deriving a subject classification code from an analysis of every word and phrase in a
web page is computationally expensive.

The HTML 3.0 standard (see page 23 of the www.w3.org document 'draft-ietf-html-specv3-
00.txt') provides 'class' as an attribute of almost all HTML '<body>' elements. The 'class'
attribute is intended to be used with style sheets. Style sheets provide a means by which the
display of HTML documents may be altered to suit the needs of different classes of browser
users. For example, <div class="appendix"> could be used to define a division that acts as an
appendix, <h2 class="section"> could be used to define a level 2 header that acts as a section
header, although, of course, any string of characters could be defined for those purposes. The
'class' attribute, although never having been suggested for holding goods and services
classifications, is not suited for such a use as it is, in any case, undesirable to confuse the style
sheet function of the 'class' attribute.

The HTML 3.0 and earlier standards provided the HTML elements '<person>' and '<address>'
but do not specify the form of the content or method of validating the content of those
elements. A person's name may be written as first name followed by last name or last name
followed by first name. Similarly, different conventions exist for writing addresses. Similar
ambiguities arise in the ill defined format of the HTML elements '<person>' and '<address>'.
As such they are of little use in the automatic compilation of searchable databases.

The XML language (see: http://textuality.com/sgml-erb/WD-xml.html) was developed to extend
HTML so that software vendors can add new elements and new element attributes to HTML
which are not specifically defined in any HTML standard. The intention is to ensure that all new
elements and attributes could be parsed by all XML parsers even if the new elements held no
significance for any particular XML parser. However, like HTML, XML does not provide a
standard for the representation of industry, service, product or subject classification, contact or
geographic location details within a web page.

Of course, many useful databases of the Yellow Pages or White Pages type are made
available by service providers on networks, but they are not compiled automatically by using
web crawlers to scan HTML web pages posted on a network. For example,
http://www.yellowpages.com.au and http://www.mcp.com provide classified advertisements of
the Yellow Pages type with links to the web pages of paying advertisers or subscribers. There
are also directories of email addresses which approximate the White Pages directories, listing
the names of individuals and organisations and contact details (eg http://www.bigbook.com
and http://query1.whowhere.com). However, these email directories require listers to manually
add their directory entries and enquirers to be aware of and to find the directory enquiry web
page. They cannot be automatically generated by scanning web pages using web crawlers
since there is no adequate mechanism to relate email addresses to the names of people and
organisations and their other contact details which may also exist in the same web page.

OBJECTIVES OF THE INVENTION

The general object of the invention is to provide improved methods for automatically building
searchable databases of classification, contact, and/or geographical information by using web
crawlers to interrogate web pages posted on a network. [For convenience, this information is
collectively referred to as CCG-data].

Other non-essential objectives are to provide methods for including and/or displaying CCG-
data within web pages accessed by browsers, for automatically extracting CCG-data from web
pages posted on a network and for using the same, and/or to provide methods for searching
automatically compiled databases using such data.

Another subsidiary objective of the invention is to provide a new form of web page which is
better suited to the automatic compilation (using web crawlers) of databases constructed by
the automatic scanning of many such pages posted on a network.

OUTLINE OF THE INVENTION

The invention is based upon the realisation that highly useful databases can be automatically
built by successively interrogating web pages posted on a network if one or more HTML
encoded CCG phrases are included in the web pages. A CCG phrase is one containing CCG-
data in a form which is directly accessible and identifiable. CCG phrases may also include one
or more items which provide the web page author with control over how the CCG-data is
applied to the database.

Data duplication can be reduced if some of the CCG-data in the coded CCG phrases can be
displayed by browsers as well as being used to update databases. Errors due to inexactly
duplicated data are also eliminated. Accordingly, it is envisaged that CCG phrases may include
one or more items which provide the web page author with control over how the CCG-data is
displayed by a browser.

HTML (including version 2 and version 3) and XML are evolving applications (sub-sets or
dialects) of ISO Standard 8879:1986 known as Standard Generalised Markup Language
(SGML). HTML in large part is a language used to describe how text (unstructured data) and
graphics are to be formatted for display. The HTML language consists of a finite number of
'elements' (for example: '<BR>' where 'BR' is the element name, also called the tag name)
which may contain 'attributes' (for example: '<OL COMPACT>' where 'COMPACT' is an
attribute named 'COMPACT') and may contain values associated with attributes (for example:
'<FONT SIZE=+1>' where +1 is the attribute value of the attribute named 'SIZE'). XML is a
language used to describe structured data. The XML language is similarly composed of
elements, attributes and values with a similar syntax to HTML but unlike HTML the element
names which may be used are not restricted and the meaning of the XML data may be
interpreted in any convenient manner. While the XML language is mute about how data
described by XML is to be formatted for display, the data may be used by computer programs
for any purpose including description of how XML coded data is displayed. However, due to its
historic importance in connection with web pages, the term 'HTML' is herein used to refer to all
markup languages which are subsets or complete sets of the SGML language. In particular,
the term 'HTML encoded CCG phrase' and the synonymous term 'CCG phrase' are herein
used to refer to CCG-data encoded in a subset or complete set of the SGML language.
Herein, a 'web page' is a document adapted to be or actually accessible through a network
and encoded in a subset or complete set of the SGML language.

For convenience, CCG items in HTML encoded CCG phrases, whether they are syntactically
represented as elements or as attributes, will be referred to hereinafter as CCG attributes.

A CCG phrase includes at least one of the following identifiable types of CCG-data attributes:

• industry, product, service, and/or subject classification,

• contact categories, contact person(s) and/or organisation(s) names, titles or
associations, contact details including physical and postal addresses, telephone and
fax numbers, email and internet or network addresses or locations, public keys, and

• geographic location details.

A CCG phrase may also include any of the following identifiable types of CCG control
attributes:

• database control attributes to indicate which parts of the data are to be used to
update databases, and

• display control attributes to indicate how browsers are to display the data.

By virtue of occurring in the same CCG phrase, a plurality of CCG-data attributes are
associated with each other.

By virtue of their occurrence in the same CCG phrase, CCG-data attributes are identified as
a set of associated attributes. However the degree of association between attributes can be
controlled by the inclusion in the phrase of database control attributes.

The start and end of CCG phrases should be identifiable to clearly distinguish these phrases
from other data. To identify the beginning and end of a CCG phrase, at least one HTML
element should have a CCG specific HTML element name or CCG specific attribute name or
CCG specific value. Each CCG attribute may consist, with or without other incidental
characters, of a CCG attribute name and/or a CCG value or values. Preferably, each CCG
phrase is contained in the '<body>' of the web page.

Examples of a CCG specific HTML element are: '<CCG ...>' or '<CCG ... />' or
'<CCG>...</CCG>'. (Where a CCG phrase is coded in XML, the elements '<XML>' and
'</XML>' may also be needed at the start and end of the CCG phrase.) A less satisfactory
example is: '<!--CCG ...-->' where the characters 'CCG' after the HTML comment element name
'!--' are used to signify that the comment contains CCG-data. An example of the use of a CCG
specific attribute name is: '<START CCG>...<END CCG>'. An example of the use of a CCG
specific value is: '<START TYPE=CCG>...<END TYPE=CCG>'. Obviously, other
character strings could be substituted for the element name, element attribute name or
element attribute value 'CCG' string of the examples.

The codes '<CCG ...>' and '<CCG ... />' are compatible with most HTML specifications, but
being non-standard HTML, most web browsers do not display any text or attributes (eg
PQ=*AQDT) within the angle brackets '<' and '>'. These codes are preferred where display of
the CCG data is not required and compatibility with older browsers is required (eg CCG
phrases containing only classification values).
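
As a hedged illustration of how a program might recognise such codes, the sketch below uses Python's standard `html.parser` module to collect the attributes of every `<CCG ...>` or `<CCG ... />` tag in a page. The attribute names `INDUSTRY` and `CITY`, the class name `CCGPhraseFinder` and the sample page are hypothetical; nothing above fixes particular attribute names.

```python
from html.parser import HTMLParser


class CCGPhraseFinder(HTMLParser):
    """Collect the attributes of each CCG element met while parsing a page."""

    def __init__(self):
        super().__init__()
        self.phrases = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lower case.
        # Self-closing tags such as <CCG ... /> also arrive here by default.
        if tag == "ccg":
            self.phrases.append(dict(attrs))


# Hypothetical page: the CCG elements carry data a browser would not display.
page = (
    "<html><body><p>Acme Plumbing</p>"
    '<CCG INDUSTRY="plumbing" CITY="Sydney">'
    '<CCG INDUSTRY="florist" />'
    "</body></html>"
)
finder = CCGPhraseFinder()
finder.feed(page)
print(finder.phrases)
```

Each dictionary in `finder.phrases` holds the attributes found together in one element, which mirrors the idea that attributes sharing a CCG phrase form one associated set.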

From one aspect, therefore, the invention comprises a web page for posting on a network, the
web page being characterised by the inclusion of at least one CCG phrase in the '<body>' of
the page, the CCG phrase being such that the CCG attributes contained therein are
accessible and identifiable by (i) HTML compliant editors and/or (ii) HTML compliant web
crawlers for the automatic construction of databases of classified information, and/or (iii) HTML
compliant browsers for display on the computer screens of network users.

From another aspect, the invention comprises a method of constructing web pages of the
above described type. The web pages may be constructed on digital computers using simple
text editors such as Microsoft Windows Notepad, or preferably, purpose built human controlled
editors or automated composing programs which embody knowledge of HTML and CCG
syntax and grammar. Whichever process is used, CCG attributes are selected and inserted,
modified, deleted and/or organised to form valid CCG phrases in HTML encoded documents
and the documents are posted on computer readable storage devices of computers connected
to a computer network so that the documents are generally available to computers on the
network.

From another aspect, the invention comprises a method of populating a database with CCG-
data extracted from web pages. Web pages posted on a network are successively retrieved by
a digital computer program (eg, a web crawler) and CCG phrases contained therein are
identified and at least some of the CCG attributes found within the CCG phrases are extracted.
The CCG attribute names are used to determine the type of data in the associated values.
Generally the CCG attributes of interest are those relating to classification, contact and
geographic data and database update controls while the attributes of little or no interest in
relation to database updating are those relating to display controls. Of course, the CCG-data
extracted need only be that relevant to the particular database being updated. For example,
one database may have been designed to index only web page classifications and URLs while
another database may have been designed to index only contact details. Databases also differ
in their internal representation of data and means of associating data. For example, some use
'flat file' tables, others use pointers to data to create network associations while others use
hashing and buckets.
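
The extraction step can be sketched as follows, assuming the CCG attributes of one phrase have already been read into a dictionary. The attribute names (`industry`, `city`, `phone`, `show`, `font`), the property names and the function `update_database` are all hypothetical and chosen only to show attribute names deciding the type of each value, with display controls being skipped.

```python
# Hypothetical CCG attribute names mapped to database property names.
PROPERTY_FOR_ATTRIBUTE = {
    "industry": "Industry",
    "city": "City name",
    "phone": "Telephone",
}
# Hypothetical display control attributes, ignored when updating a database.
DISPLAY_CONTROLS = {"show", "font"}


def update_database(table, url, ccg_attributes):
    """Append one row of associated property values taken from a CCG phrase."""
    row = {}
    for name, value in ccg_attributes.items():
        if name in DISPLAY_CONTROLS:
            continue  # of no interest for database updating
        prop = PROPERTY_FOR_ATTRIBUTE.get(name)
        if prop is not None:
            row[prop] = value
    if row:
        row["URL"] = url  # keep the source page address with the values
        table.append(row)
    return row


table = []  # a 'flat file' style table: one dict per CCG phrase
update_database(
    table,
    "http://example.com/acme",
    {"industry": "plumbing", "city": "Sydney", "font": "bold"},
)
print(table)
```

A database designed to index only classifications could use a smaller mapping, so that contact attributes are simply never copied into a row.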

The conventional nomenclature differs considerably between different types of database.
Depending on the particular database nomenclature, data of the same type is said to be stored
in table columns, fields, attributes and properties. The terms column and field are somewhat
related to the physical representation of the data in files while attribute and property are more
related to the logical representation of data. To avoid confusion with the terms 'HTML
attribute', 'CCG attribute' or just 'attribute', hereinafter a database property means both a type
of data stored in the database and a place in the database where data of the same type is
stored. Database properties are referred to by a name ('property name') or similar reference
and contain values. For example, a database property with the name 'City name' and which
contains values which are all the names of cities may be defined as a 'City name' type
database property.

Whichever style of database is used, it is preferred that the database update program relate
the CCG attributes to corresponding database properties used by the database update
process so that the database property values are updated with CCG values in a manner which
preserves the distinctness, content and meaning of the CCG values and, preferably, preserves
the CCG value associations expressed in the CCG phrase as sets of associated database
property values of different types.

In some cases, it is desired to know the address of the web page from which the CCG values
were extracted. For example, the purpose of building a database might be to allow searching
of the database by web page classification to provide a list of URLs of web pages or URLs of
portions of web pages which contain matching CCG classifications. The URLs could then be
inserted in an HTML document and transmitted to a web browser as a list of references to web
pages matching a search expression. In that example, associating the URL of a web page or
the URL of a portion of a web page with the CCG values extracted from the same web page or
web page portion is important and the URL or means of reconstructing it must be available and
supplied to the database update process. In one style of database, the values of the same
type are held in separate rows in a column (property) of a database table, and pointers held in
another column (property) are associated with the values by sharing the same table row. The
table row constitutes a set of associated property values. Each pointer points to a bucket
(block of data) containing a list of URLs or pointers to URLs held in a separate bucket or table.
In another style of database, values of different types are held in different tables together with
a set number, pointer or similar code which is used to indicate which values are associated as
members of the same set. In one variation, the values of set members are prefixed with a code
indicating the type of value and all values are held in the same column of a table. If the
purpose of the database is to hold contact data, recording the web page URL in the database
might not be required although if the URL is not present in the database, updating changes in
the CCG contact details contained within a web page is more difficult. Of course, one
database may be used to record all types of CCG values contained in web pages and
associate with each other any and all values extracted from the same web page or even from
other web pages.
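
The variation in which every value is prefixed with a type code and held in a single column, alongside a set number, might be modelled as below. This is a minimal sketch rather than a definitive layout: the type codes `IND`, `CITY` and `URL`, the function names and the sample data are assumptions made for the example.

```python
def store_set(table, set_number, typed_values):
    """Store each value, prefixed with its type code, against the set number."""
    for type_code, value in typed_values:
        table.append((set_number, f"{type_code}:{value}"))


def sets_containing(table, type_code, value):
    """Return the set numbers whose members include the given typed value."""
    wanted = f"{type_code}:{value}"
    return sorted({number for number, stored in table if stored == wanted})


def members_of(table, set_number):
    """Return every typed value recorded for one set."""
    return [stored for number, stored in table if number == set_number]


table = []  # one column of typed values plus a set number per row
store_set(table, 1, [("IND", "plumbing"), ("CITY", "Sydney"),
                     ("URL", "http://example.com/acme")])
store_set(table, 2, [("IND", "florist"), ("CITY", "Sydney"),
                     ("URL", "http://example.com/rose")])

for number in sets_containing(table, "CITY", "Sydney"):
    print(number, members_of(table, number))
```

Storing the URL as just another typed member of the set keeps the page address associated with the values, which is what makes later updating of changed details practical.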

From another aspect, the invention comprises a method of searching the databases
constructed as outlined above. These databases may be used for a variety of searching
purposes. For example, to find web page URLs by using the association of web page URLs
with industry, service, product or subject classification or a person's or organisation's name or
address or geographic location values or any combination thereof. In another example the
databases may be used to find the contact details for people or organisations by name or
location or industry, service, product or web page subject type and so forth by using the
association between items of the contact details in the database without having to retrieve web
pages associated with the contact details.

More particularly, the searching method involves finding URL references or finding sets of
associated database property values, from databases containing CCG-data. The method
[...] a computer network to extract query relational expressions and, from each expression,
deriving a query field name, query relational operator and query value, determining the type
of the query field by reference to its name, relating the query field to a corresponding
database property according to type and locating CCG-data database property values in the
database property which return a true value when [...] value using the query relational
operator. Finally, the URL references [...] associated with the so located CCG-data database
property [...]
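
The steps named above (deriving a field name, relational operator and value from an expression, relating the field to a database property, then locating matching values) can be sketched as follows. The expression syntax, the field-to-property mapping and the function names are hypothetical, and comparison here is plain case-sensitive string comparison.

```python
import operator
import re

# Relational operators supported by this sketch.
OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
}
EXPRESSION = re.compile(r"\s*([A-Za-z_ ]+?)\s*(<=|>=|!=|=|<|>)\s*(.+?)\s*$")
# Hypothetical query field names related to database property names.
PROPERTY_FOR_FIELD = {"city": "City name", "industry": "Industry"}


def parse_expression(text):
    """Split 'Field=Value' into (field name, operator symbol, value)."""
    match = EXPRESSION.match(text)
    if match is None:
        raise ValueError(f"not a relational expression: {text!r}")
    field, symbol, value = match.groups()
    return field.strip().lower(), symbol, value


def find_urls(rows, expression):
    """Return the URLs of rows whose property value satisfies the expression."""
    field, symbol, value = parse_expression(expression)
    prop = PROPERTY_FOR_FIELD[field]  # relate query field to property
    compare = OPERATORS[symbol]
    return [row["URL"] for row in rows
            if prop in row and compare(row[prop], value)]


rows = [
    {"Industry": "plumbing", "City name": "Sydney", "URL": "http://example.com/acme"},
    {"Industry": "florist", "City name": "Perth", "URL": "http://example.com/rose"},
]
print(find_urls(rows, "City = Sydney"))
```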

Database queries are usually expressed in a query language in the form of a phrase or
sentence. In query by example style enquiry systems, the user types values into input fields on
a form and a program extracts the input values and uses the values to automatically compose
a query phrase or sentence. There are many existing examples of query languages used in
connection with databases. Generally, they consist of relational expressions (eg Field=Value),
logical expressions and grouping of relational and logical expressions by means such as
parentheses. They may also contain sorting and output formatting expressions. Often
abbreviated notation is used in the expressions such as leaving out field names or relational
operators which are then inferred from the value in the expression or implied by default. In an
enquiry the nature and format of the output may also be implied, such as a list of URLs of web
pages or a list of contact details. Whatever is the mechanism of any particular database, the
query expression needs to be parsed and fields in the query expression, explicit, default,
implied or inferred, need be related to database properties of similar type. In some styles of
database enquiry the query expression is evaluated against each row of a table or record of a
file to find rows or records (ie a set of associated property values) which match the query
expression. In other styles, sub-sets of the values of the properties are selected according to
the interpretation of relational expressions in the query expression and the sub-sets are
combined according to logical and grouping expressions in the query to find the sets of
associated property values which match the query expression. Often, to make logical
operations which combine the selected sub-sets more efficient, it is not the values which are
selected but pointers to the values (eg table name and table row) or unique keys (eg URLs or
pointers to URLs) associated with the values. For example, the AND logical operator is often
used to combine two lists so that only values or pointers or keys common to both lists are
found in the combined list. Usually, the query produces a result list which is then provided to
other processes. For example, a list of URLs of web pages is processed to produce an
attractively formatted HTML encoded document containing the URLs and is sent to a web
browser [illegible] interesting web pages. In another example, the contact details associated
with each value or pointer in the result list are retrieved from the database and presented as a
report in the form of an HTML encoded document and is sent to a web browser for viewing.
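The combination of selected sub-sets by logical operators can be illustrated with a short sketch in which each sub-set is a list of unique keys such as URLs. The function names `combine_and` and `combine_or` are hypothetical, and a real database would more likely merge sorted pointer lists than build in-memory sets.

```python
def combine_and(left, right):
    """AND: keep only keys (for example URLs) common to both lists."""
    common = set(left) & set(right)
    return [key for key in left if key in common]


def combine_or(left, right):
    """OR: keep keys found in either list, without duplicates."""
    seen = set()
    merged = []
    for key in list(left) + list(right):
        if key not in seen:
            seen.add(key)
            merged.append(key)
    return merged
```

Working on keys rather than on the property values themselves keeps the intermediate lists small, which is the efficiency point made above.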

[illegible] of displaying CCG-data contained in CCG phrases within web pages which are
displayed by a web browser executing on a digital computer. While a web page is loading or has
loaded in a web browser, the web browser parses the web page and displays the text (or data)
of the web page on a display device connected to the computer. When the web browser parser
encounters CCG phrases, the web browser may display the CCG-data (element and/or attribute
names (or translations of element and/or attribute names) and/or values) in a number of
browser specific ways. For example, the web browser may by default not display any CCG-data,
display all CCG-data, not display any CCG-data until a CCG display control attribute explicitly
states that subsequent data should be displayed or display all CCG-data until a CCG display
control attribute explicitly states that subsequent data should not be displayed. The web
browser may also use CCG display controls specifying the size, font, position and so forth to
alter the display of the CCG-data.

DESCRIPTION OF EXAMPLES

Having indicated the nature of the present invention, examples or embodiments thereof will
now be described by way of illustration only.

Example 1: HTML Syntax Suitable for Representing a CCG Phrase

The following is an example of HTML element syntax suitable for representing CCG phrases in
which a control (e.g. "SHOW") may be "good until countermanded" and thus apply to more
than one field:
<CCG HREF="url"
    {{NAME="label" | ID="identifier_code"} &| {LANG="language_code" &
    CLASS="class_name"}}
    {
        {SET_SEPARATOR} &|
        {INDEX | NOINDEX} &|
        {SHOW | HIDE} &|
        {XPOS="horizontal_position_number"} &|
        {YPOS="vertical_position_number"} &|
        {NEWLINE} &|
        {ALIGN=centre | left | right | justify} &|
        {SIZE=[+|-]1|2|3|4|5|6|7} &|
        {COLOR="#rrggbb" | "colour_name"} &|
        {FACE="type_face_name"} &|
        {BLINK &| BOLD &| UNDERLINE &| ITALIC &| STRIKE} &|
        {SUBSCRIPT | SUPERSCRIPT} &|
        {CLEAR{=left|right|all}} &|
        {NORMAL} &|
        {{{CONTACT &| COPYRIGHT &| DEVELOPER} &
        {PERSONAL &| BUSINESS &| ASSOCIATION}} &|
        {attribute_name="attribute_value(s)"}}
    } ...
>

where: the ellipsis implies optional repetition of the braced ("{ }") items; the braces are
used to group items and are not CCG syntactic elements; "&" (and) implies items must occur
together, "|" (or) implies only one item must occur, and "&|" (and/or) implies any, including
none, of the items may appear together.

Using the syntax of this example, each CCG phrase is represented as an HTML element, the
element name being "CCG" and the CCG-data (eg attribute_name="attribute_value") and CCG
controls (eg SIZE="1") are represented as attributes of the HTML element. Some of the
attributes (eg SIZE) have explicit values (eg "1") and some attributes have implied values
depending on the presence or absence in a CCG phrase (eg when the attribute BUSINESS is
present it has the implied value of True and the implied value of False when absent).
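The distinction between explicit and implied attribute values can be sketched with Python's standard `html.parser` module, which reports an attribute written without a value as `(name, None)` and lower-cases tag and attribute names. The class name `CCGAttributeReader`, the helper `read_ccg` and the choice of `FLAG_NAMES` are assumptions made for illustration only.

```python
from html.parser import HTMLParser

# Attributes treated as True when present and False when absent (assumed subset).
FLAG_NAMES = {"contact", "copyright", "developer",
              "personal", "business", "association"}


class CCGAttributeReader(HTMLParser):
    """Collect the attributes of every <CCG ...> start tag in a page."""

    def __init__(self):
        super().__init__()
        self.phrases = []

    def handle_starttag(self, tag, attrs):
        if tag != "ccg":                 # html.parser lower-cases tag names
            return
        flags = {name: False for name in FLAG_NAMES}
        values = []
        for name, value in attrs:
            if value is None and name in FLAG_NAMES:
                flags[name] = True       # implied value: present means True
            elif value is not None:
                values.append((name, value))   # explicit value, order preserved
        self.phrases.append({"flags": flags, "values": values})


def read_ccg(html):
    reader = CCGAttributeReader()
    reader.feed(html)
    reader.close()
    return reader.phrases
```

Repeated attributes such as two CC values are kept as separate pairs, because the parser hands over the attribute list in document order.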

Representation in XML syntax requires, at most, only a simple translation. All the items, such
as "NORMAL" and "attribute_name" may remain unchanged as attributes of the element
named "CCG" (eg <CCG size="1"/>). However, when a CCG phrase is encoded in XML it is
preferred that the items are represented as XML elements. For example attribute "SIZE"="1"
can be represented as element "<size>1</size>" or "<size value="1"/>" and "NORMAL" can
be represented as "<normal/>".

In this example, the attributes ID, LANG and CLASS take their meanings from HTML 3.0. The
"url" in HREF="url" may be a link with or without destination anchor labels. For example the
URL http://www.w3.org/docs.html does not contain a destination anchor label (or identifier)
while http://www.w3.org/docs.html#searching does contain the destination anchor label
"#searching" which is intended to refer to an anchor in docs.html such as <A
NAME="searching">...</A>. There is some confusion in various HTML standards
documentation about the distinction between the expression NAME="label" and the expression
ID="identifier_code". For most practical purposes the two expressions have the same function
or meaning: to uniquely identify within a document a position in or portion of that document.

Database control attributes:

"Set_separator" indicates the end of association between preceding and following data other
than through the weaker mutual association with the same CCG phrase or web page; the data
are divided into sets. "Index | NoIndex" indicates that the following data are / are not to be
indexed by a web crawler. These attributes have an implied attribute value of "True" if present
in and "False" when absent from a CCG phrase.

Display control attributes:

"Show | Hide" indicates that a browser should show / not show the following data. "Xpos" and
"Ypos" indicate the position (for example in pixel or physical units) on the browser screen where
the data is to be displayed; "Newline" may be used in addition or as an alternative method of
placing text on a browser screen. "Align" indicates the positioning of data on a browser screen
relative to the cursor position set by "Xpos", "Ypos" or "Newline". "Size", "Colour" and "Face"
indicate the size, colour and type face or font of the following data when displayed on a
browser screen. "Blink", "Bold", "Underline", "Italic", "Strike", "Superscript" and "Subscript"
indicate that the following data should be displayed blinking, bold, underlined, italicised, struck
through, superscripted or subscripted. "Clear" indicates that the browser screen in the region
where data will be displayed should be cleared to background before displaying the following
data. "Normal" indicates the data is to be displayed without the "Blink" to "Clear"
characteristics. The display controls which consist of an attribute name without an explicit value
have an implied value of "True" when present and "False" when absent.
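The "good until countermanded" behaviour of "Show" and "Hide" can be sketched as a small state machine over the items of a CCG phrase taken in document order. The function name `visible_values` and the `show_by_default` parameter are hypothetical; the sketch also treats every item carrying a value as displayable, which a real browser need not do.

```python
def visible_values(items, show_by_default=True):
    """Return the values a browser would show for (name, value) items.

    A control such as SHOW or HIDE has value None and stays in force
    until the opposite control is met.
    """
    showing = show_by_default
    shown = []
    for name, value in items:
        key = name.upper()
        if key == "SHOW":
            showing = True
        elif key == "HIDE":
            showing = False
        elif value is not None and showing:
            shown.append(value)
    return shown
```

Passing `show_by_default=False` models a browser that displays no CCG-data until a "Show" control is met.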

CCG-data attributes:

"Contact &| Copyright &| Developer" indicates that the following CCG-data refers to details for
a person or organisation and/or to the copyright owner and/or to the HTML or web page
developer. "Personal &| Business &| Association" indicates that the following data refers to
details for a person and/or business and/or association. The previous CCG-data attributes
have an implied attribute value of "True" if present in a CCG phrase or set and "False" when
absent from a CCG phrase or set. The attribute_name could be standard CCG attribute names
or synonyms of standard CCG attribute names or abbreviations of CCG attribute names which
refer to the following types of CCG attribute values where square brackets "[" and "]" surround
suggested attribute names:
• industry or service or product or subject classifications and sub-classifications:
    • classification name [CN],
    • classification codes [CC],
• display only text [TEXT],
• contact:
    • person:
        • courtesy title [PNC],
        • first given name [PNG],
        • other given names [PNO],
        • family name [PNF],
        • name suffix [PNS],
        • qualifications [PQ],
        • associations [PA],
        • contact person title [PT],
        • contact person role [PR],
    • organisation:
        • name [ON],
        • unit [OU],
        • identifier [OID],
    • physical or post or delivery address:
        • type [AT] (= "PHYSICAL" &| "POSTOFFICE" &| "POSTAL" &| "DELIVERY"),
        • post office box number [AP#],
        • post office name [APN],
        • room or suite or office or unit or flat or apartment name &| number [AB#],
        • floor name &| number [ABF],
        • building name [ABN],
        • lane or street or road or highway number [AS#],
        • lane or street or road or highway name [ASN],
        • suburb or town or city name [ACN],
        • region or state or territory or province name [ARN],
        • post code [APC],
        • country or nation name [ANN],
    • telephone:
        • type [TT] (= "PREFERRED" &| "VOICE" &| "MOBILE" &| "CAR" &| "MESSAGE" &|
          "PAGER" &| "FACSIMILE" &| "MODEM" &| "ISDN" &| "VIDEO"),
        • nation or country code number [TC#],
        • trunk access number [TT#],
        • area code number [TA#],
        • local number [TL#],
    • email:
        • type [ET] (= "INTERNET" | (other)),
        • mailer [EM],
        • address [EA],
    • Internet address:
        • url [IURL],
    • date & time:
        • date & time from [DTF],
        • date & time to [DTT],
        • weekday from [DTWF],
        • weekday to [DTWT],
        • weekday time from [DTWFT],
        • weekday time to [DTWTT],
        • time zone [DTZ],
    • brand name [BN],
    • public key:
        • key type [KT],
        • key [K],
• geographical:
    • location units [GLU],
    • location [GL],
    • serviced region units [GLRU],
    • serviced region [GLR].

Suggested attribute name [CN] is the name of an attribute associated with the attribute value
containing "classification name" type data. For example, the [CN] attribute value could be the
name of a proprietary or national or international or other industry classification standard such
as the Australian and New Zealand Standard Industry Classification or "ANZSIC" for short or
the U.S. Bureau of the Census Industrial Classifications (USBCIC). The associated
classification codes [CC] attribute value could contain the codes and/or descriptions of the
codes of the named standard with or without modifications, deletions or extensions. For
example: CN="ANZSIC" CC="I61;Road transport" or CN="USBCIC" CC="581;Hardware store".
Service classifications such as the International Standard Classification of Occupations could
be used. For example: CN="ISCO" CC="4430;Auctioneer". Product classifications such as the
Harmonised Commodity Description And Coding System could be used. For example:
CN="HSC" CC="8411;Turbojets, turbopropellers & other gas turbines; parts thereof". For
subject classifications, Dewey Decimal and/or Universal Decimal and/or Library of Congress
and/or Bliss and/or Colon Classification could be used. For example: CN="DDC"
CC="577.699;Sea shore ecology". The inclusion of subject classifications provides a very
simple, straightforward method of classifying the subject matter of an HTML document which
is attractive to commercially oriented copyright owners.
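An indexing or search program would need to separate the code from its description in a [CC] value. A minimal sketch follows, assuming (as the examples above suggest, but the text does not state as a rule) that the first semicolon separates the code from the description; the function name `split_classification` is hypothetical.

```python
def split_classification(cc_value):
    """Split a [CC] value into (code, description) at the first semicolon.

    Only the first semicolon is used, because a description may itself
    contain semicolons.
    """
    code, _, description = cc_value.partition(";")
    return code.strip(), description.strip()
```

A value with no semicolon is returned as a code with an empty description.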

The text ([TEXT]), person ([PNC] - [PR]), organisation ([ON] - [OID]), physical or post or
delivery address ([AT] - [ANN]), telephone ([TT] - [TL#]), email address ([ET] - [EA]) and
Internet address [IURL] are intended to be associated with each other in the obvious manner.
Date & time(s) ([DTF] - [DTZ]) are intended to indicate the times at which the address and/or
telephone and/or email will be serviced by the associated person(s) and/or organisation(s).
The brand name ([BN]) attribute is intended to hold commercial brand names. Public key ([KT]
- [K]) is intended to hold public encryption keys for secure communication with the contact
person or organisation.

The geographical location [GL] could be a latitude and longitude (eg
E148D31'12.5";S36D40'09.6" or E148.5201;S36.6693 or +148.5201;-36.6693), or a Universal
Grid Reference (eg 55FV364402) or other global, national, regional or local location reference
with units as specified [GLU], which is typed in or obtained by pointing to a digitally encoded
map or other methods. In more populated regions of some countries such as the U.S., street
addresses and post codes are associated with a moderately accurate geographic location and
can be used to interpolate geographic location data where geographic location data is not
explicitly stated in the CCG-data. Using a universally recognised code such as latitude and
longitude has advantages when used with international mediums like the Internet.
Geographical location is intended to be associated with a post delivery address or physical
address such as place of business or residence. A CCG compliant browser could use this
reference to display a map centred on that geographic location. The purpose of the
geographical location data is to allow browser users to specify search engine search criteria
which will result in the search engine selecting only those Internet accessible documents which
provide details about providers which are within a specified region. The serviced region [GLR]
is intended to indicate the preferred area of operation of providers expressed in terms of
serviced region units [GLRU]. A radial distance (eg in kilometres) or alternate means of
expressing an area of interest around a geographic point, such as polygons, are envisaged.
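Where the serviced region is a radial distance in kilometres, a search engine could test whether a searcher's location falls inside it. The following is a sketch only: it uses the haversine great-circle formula with a mean Earth radius, takes locations as signed decimal (latitude, longitude) pairs, and the function names are hypothetical. The specification does not say how the distance is to be computed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in signed decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_serviced_region(provider, searcher, radius_km):
    """True if `searcher` (lat, lon) lies within `radius_km` of `provider` (lat, lon)."""
    return distance_km(*provider, *searcher) <= radius_km
```

A polygonal serviced region would need a point-in-polygon test instead, which this sketch does not cover.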

It is envisaged that the CCG attribute_value could be composed of more than one value
(actually sub-value) wherein specific characters or character strings separate individual values.

While specific instances of element names and types have been given in this example, of
more importance is the type of data and type of controls over the display and indexing of the
data. As an alternative to the preferred immediately following example where the CCG-data is
lumped together under the HTML element named "CCG", certain elements of the data, for
example the classification data, could be lumped under separate HTML elements with
distinctly different names thereby separating CCG classification data from CCG contact data.
However, this is not preferred because the strength of association between the two types of
data is weakened.

Example 2: Classification of Portion of a Web Page

Where it is desired to classify a portion of a web page, such as a paragraph about a product,
simple CCG-data may be used in conjunction with the syntax of Example 1. For example:
<A NAME="Radios">AM-FM radio receivers: </A>
<CCG HREF="#Radios"
    CN="ANZSIC"
    CC='E23.34.78;Electrical equipment - radio receivers AM'
    CC='E23.34.79;Electrical equipment - radio receivers FM'>
</CCG>
We won't be beaten on the price of these high quality receivers ....

In this example, the CCG phrase appears after the related anchor (<A NAME=...</A>).
However, while such proximity visually provides an obvious association between the anchor
and related CCG phrase, it is intended that a CCG phrase containing the attribute HREF related
to a specific anchor could appear anywhere within the body of a web page and remain related
to the named anchor. The CCG phrase containing the attribute HREF could appear in a
separate document and thereby relate the CCG-data to the entire document or to a named
anchor although, as previously noted, coordinating separate documents can be problematic. In
the absence of the HREF and NAME attributes, it is also intended that the CCG-data apply to
the whole web page.

Example 3: Classification of Portion of a Web Page using XML Syntax

Using XML syntax and similar attribute names to those of Example 2 the HTML fragment of
Example 2 may be rewritten as:

<A NAME="Radios">AM-FM radio receivers: </A>
<XML>
<CCG>
    <HREF>"#Radios"</HREF>
    <CN>"ANZSIC"</CN>
    <CC>"E23.34.78;Electrical equipment - radio receivers AM"</CC>
    <CC>"E23.34.79;Electrical equipment - radio receivers FM"</CC>
</CCG>
</XML>
We won't be beaten on the price of these high quality receivers ....
This example demonstrates that the translation of CCG-data from HTML to XML (and the
reverse) involves simple syntactical and grammatical translations. Of course, the resulting
HTML and XML, while "well formed", might not be recognised or, if recognised, might not be
understood by some parsers.
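The attribute-to-element translation can be sketched with Python's standard `xml.etree.ElementTree` module. The function name `ccg_attributes_to_xml` is hypothetical, and the input is assumed to be a list of (name, value) pairs in which a value of `None` marks an attribute with only an implied value. Note that suggested names containing "#" (such as AS#) are not valid XML element names, so a strict XML parser would reject them; this sketch does not address that.

```python
import xml.etree.ElementTree as ET


def ccg_attributes_to_xml(attributes):
    """Translate CCG attribute pairs into a <CCG> element with child elements.

    Valued attributes become elements with text; valueless attributes
    become empty elements.
    """
    root = ET.Element("CCG")
    for name, value in attributes:
        child = ET.SubElement(root, name.upper())
        if value is not None:
            child.text = value
    return ET.tostring(root, encoding="unicode")
```

The reverse translation would walk the child elements of `<CCG>` and emit each as an attribute of a single HTML element.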

Example 4: Constructing a Web Page Containing CCG-data

As an example, a web page developer, Alice Jamieson, is preparing an advertisement for a
local electrician, John Williams, trading as Kelso Electrical, who wants to advertise on the web
for business within 30 kilometres from his office located at 18 Raglan Street, Kelso, New South
Wales. Alice uses a graphical user interface web page authoring tool capable of creating and
modifying web pages containing HTML (and XML) CCG phrases by accepting inputs from a
user. The tool executes on a digital computer having input devices such as a keyboard,
mouse, light pen and touch pad, display devices such as a CRT, LED arrays, liquid crystal
arrays and computer-readable media such as magnetic and optical disks, memory arrays,
magnetic tape and the like.

The authoring tool also embodies knowledge of the content and structure of CCG phrases
such as the attribute names, valid ranges and sets of associated attribute values, the normal
order of the attributes in the CCG phrase and interdependencies between attribute values. The
tool provides a window where web pages may be viewed in layout (browser) mode and
another window where the HTML code may be viewed in editing mode. The tool also provides
means of inserting, deleting, modifying and organising HTML elements, changing font size,
face and colour and so forth. The tool provides means for the user to build CCG phrases by
using input devices to select an edit control representing various types of CCG attributes from
a list which the tool then inserts in the body of a web page together with, when not already
present, HTML code indicative of the start and end of a CCG phrase. The user then types in
the value in the attribute. Similarly, the tool provides means of converting web page text to
CCG attributes. Using input devices, the user selects the text to be converted to a CCG
attribute then selects an edit control from a list; the tool then inserts the HTML code necessary
to encode the text as a CCG attribute. However, these semi-manual methods of creating and
modifying CCG phrases are inefficient and error prone. The tool also provides a button, which
can be activated by using input devices, for access to CCG phrase editing functions. The CCG
editing functions consist of a means of extracting the CCG values from existing CCG phrases
in the web page being edited, forms for entering and modifying the extracted CCG values, a
layout view browser window for altering how the CCG-data displays (position, font size, face,
colour, bold, normal, hiding or showing and so forth), a data view browser window to alter
which CCG-data values are to be indexed or not indexed in search engine databases, and a
means of deleting existing CCG phrases from web pages and inserting new or changed CCG
phrases in web pages. Editing cursors marking the current location at which text and/or data
may be inserted, deleted or modified are provided in each window and form.

In the current example, the web page initially contains no CCG phrase. Clicking the CCG
editing function button of the authoring tool causes a form to appear. The form contains
prompts related to CCG attribute names and associated data input fields related to the CCG
attribute values associated with the CCG attribute names, that is CCG-data. The fields are
blank because, in the web page layout view, the edit cursor is not over a CCG phrase (and
cannot be since the web page initially contains no CCG phrase). The service classifications
relevant to the web page, John Williams' physical business contact address, phone and fax
numbers, email address and geographic location and his post office business contact
addresses are entered into the forms using a keyboard and mouse. The developer, Alice
Jamieson, also includes her basic contact details where provided for on the form. The forms
use drop down lists to select address blocks (eg physical and post office) for editing. Logic
associated with the forms validates the CCG attribute values and interdependencies. Input
devices are then used to control the CCG-data layout view browser to modify the appearance
of the CCG-data such as font size and colour and positioning. In the layout browser, input
devices communicating with the edit cursor are used to highlight individual items and blocks of
items to be changed. The post office address is highlighted as a block and moved into position
in line with the physical address. The CCG-data view window is then used to check which data
items are to be indexed by search engines. In this example all CCG-data (ie all CCG attribute
values except display control values and database control values) are to be indexed. Input
devices are used to control the edit cursor to highlight the entire data and a mouse is used to
click (activate) a button to mark all the data for indexing. Then another button is clicked which
builds an HTML encoded CCG phrase of CCG attributes derived from the CCG-data values,
display control values and database control values and inserts the CCG phrase in the web
page at the location pointed to in the web page layout browser window.

The HTML code editing mode window was called up which revealed the following HTML
encoded CCG phrase in the web page:
<XML>
<CCG>
    <INDEX/>
    <HIDE/>
    <CN>ANZSIC</CN>
    <CC>D36.11.45;Electrical contractors - residential</CC>
    <CC>D36.11.46;Electrical contractors - industrial</CC>
    <SHOW/>
    <CONTACT/> <COPYRIGHT/>
    <BUSINESS/>
    <XPOS>50</XPOS>
    <YPOS>320</YPOS>
    <ALIGN>centre</ALIGN>
    <SIZE>3</SIZE>
    <COLOR>black</COLOR>
    <FACE>Times New Roman</FACE>
    <BOLD/>
    <CLEAR>all</CLEAR>
    <TEXT>Contact :</TEXT>
    <PNC>Mr</PNC>
    <PNG>John</PNG>
    <PNF>Williams</PNF>
    <PQ>AIE</PQ>
    <PA>ARUC</PA>
    <NEWLINE/>
    <PT>Managing Director</PT>
    <NEWLINE/>
    <ON>Kelso Electrical Pty. Ltd.</ON>
    <NEWLINE/>
    <NORMAL/> <ITALIC/>
    <SIZE>-2</SIZE>
    <TEXT>NSW License 45678C</TEXT>
    <NEWLINE/>
    <NORMAL/> <BOLD/>
    <SIZE>+2</SIZE>
    <AT>PHYSICAL</AT>
    <AS#>18</AS#>
    <ASN>Raglan Street</ASN>
    <NEWLINE/>
    <ACN>Kelso</ACN>
    <NEWLINE/>
    <ARN>NSW</ARN>
    <NEWLINE/>
    <HIDE/>
    <ANN>Australia</ANN>
    <NEWLINE/>
    <SHOW/>
    <TEXT>Phone:</TEXT>
    <TT>PREFERRED ; VOICE ; MESSAGE</TT>
    <HIDE/>
    <TC#>61</TC#>
    <SHOW/>
    <TT#>0</TT#>
    <TA#>63</TA#>
    <TL#>456-7828</TL#>
    <TEXT> Fax:</TEXT>
    <TT>FACSIMILE</TT>
    <HIDE/>
    <TC#>61</TC#>
    <SHOW/>
    <TT#>0</TT#>
    <TA#>63</TA#>
    <TL#>456-7829</TL#>
    <NEWLINE/>
    <ET>INTERNET</ET>
    <EA>johnw@firefly.com.au</EA>
    <TEXT> </TEXT>
    <GLU>LatLong</GLU>
    <GL>33.3978S;148.5679E</GL>
    <GLRU>Km</GLRU>
    <GLR>30</GLR>
    <SET_SEPARATOR/>
    <XPOS>250</XPOS>
    <YPOS>320</YPOS>
    <NEWLINE/>
    <NEWLINE/>
    <TEXT>Or write to us at :</TEXT>
    <NEWLINE/>
    <ON>Kelso Electrical Pty. Ltd.</ON>
    <NEWLINE/>
    <AT>POSTOFFICE</AT>
    <AP#>P.O. Box 187</AP#>
    <NEWLINE/>
    <APN>Sunny Corner</APN>
    <TEXT> </TEXT>
    <APC>2795</APC>
    <NEWLINE/>
    <HIDE/>
    <ANN>Australia</ANN>
    <SET_SEPARATOR/>
    <HIDE/>
    <DEVELOPER/>
    <BUSINESS/>
    <PNG>Alice</PNG>
    <PNF>Jamieson</PNF>
    <ET>INTERNET</ET>
    <EA>alijam@firefly.com.au</EA>
    <IURL>http://www.firefly.com.au/~alijam/</IURL>
</CCG>
</XML>

In the web page layout browser window the CCG-data displayed as follows:

Contact :                               Or write to us at :
Mr John Williams, AIE, ARUC,
Managing Director
Kelso Electrical Pty. Ltd.              Kelso Electrical Pty. Ltd.
NSW License 45678C                      P.O. Box 187
18 Raglan Street                        Sunny Corner 2795
Kelso
NSW
Phone: 063-456-7828  Fax: 063-456-7829
Email: johnw@firefly.com.au  Map

Having encoded the web page in this way, Alice then posts it on the storage device of a digital
computer connected to the Internet from where it can be retrieved through the Internet using
the URL http://www.firefly.com.au/~johnw/

Example 5: Constructing a Database from Web Pages Containing CCG-data

During a routine sweep of Internet connected web page servers, a web crawler (or robot)
operating on a server named "ccg.search.com" executing on an Internet connected digital
computer discovers the URL http://www.firefly.com.au/~johnw/[illegible] in a document it
had previously retrieved through the Internet. The web crawler decides that the URL matches
its selection criteria because the URL contains the suffix ".html". The web crawler then
successfully retrieves the document by extracting from the URL the address of the computer
hosting the document, addressing and sending a message (including the address of the web
crawler) requesting the web page through the network to the web page host computer using
TCP/IP protocol; the host computer then reads the document, addresses and sends the
document to the web crawler using TCP/IP protocol, the web crawler then waiting until it has
received all parts of the web page from the host computer before proceeding. It inspects the
contents of the document and finds that it matches the additional selection criteria that it is an
HTML encoded document. The web crawler program, depending on its state and logic, then
parses the document, strips out and saves some or all of the URLs in the document for future
examination. The web crawler program then passes the document together with the URL of
the document through a network communications channel to an indexing program executing
on a different computer. The indexing computer has database updating software which
manipulates a database stored on computer-readable media.
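The crawler's step of stripping URLs out of a retrieved document and keeping those that meet its selection criteria can be sketched as below. The class `LinkCollector` and function `select_links` are hypothetical names, the ".html" suffix test is the only criterion modelled, and no network retrieval is attempted.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the HREF attributes of <A> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve partial or relative URLs against the document URL.
                self.links.append(urljoin(self.base_url, value))


def select_links(html, base_url, suffix=".html"):
    """Return the links in `html` that match the crawler's suffix criterion."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return [url for url in collector.links if url.endswith(suffix)]
```

The selected URLs would then be queued for future examination, as described above.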

The indexing program parses the document from first to last character, indexing some of the
meta data in the <head> of the document and the words in the text of the document with
respect to the document URL. In the database of this example, unique words extracted from
the documents already indexed are held in separate rows of a column of a database table and
in another column of the same table on each row is an associated pointer to the first bucket or
block of URLs of documents containing the word associated with the pointer. When new words
are found, the new word is added as a new row in the word column of the table, a new bucket
is created, the URL of the document containing the new word is inserted into the bucket and a
pointer to the new bucket is written in the new row pointer column. When the same word is
found in another document the row in the table of the word is found, the pointer is retrieved
from the table, the bucket pointed to by the pointer is retrieved and the URL of the other
document is inserted in the bucket. Where a bucket becomes full of URLs, a new bucket is
created and a pointer to the new bucket for holding additional URLs is placed in the full bucket.
Deletion of words and URLs of changed or no longer existing documents is also provided for.
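The word table with chained URL buckets can be sketched in memory as follows. The names `Bucket`, `WordIndex` and `BUCKET_CAPACITY` are hypothetical, the capacity is deliberately tiny so that chaining is visible, and deletion is omitted. A real system would hold the table and buckets on computer-readable media rather than in Python objects.

```python
BUCKET_CAPACITY = 4  # assumed value, kept small for illustration


class Bucket:
    """A block of URLs with a pointer to the next bucket once this one is full."""

    def __init__(self):
        self.urls = []
        self.next = None


class WordIndex:
    def __init__(self):
        self.first_bucket = {}  # word column -> pointer to first bucket

    def add(self, word, url):
        bucket = self.first_bucket.get(word)
        if bucket is None:                      # new word: new row, new bucket
            bucket = self.first_bucket[word] = Bucket()
        while True:                             # walk the chain to its last bucket
            if url in bucket.urls:
                return                          # URL already indexed for this word
            if bucket.next is None:
                break
            bucket = bucket.next
        if len(bucket.urls) >= BUCKET_CAPACITY:  # full: chain a new bucket
            bucket.next = Bucket()
            bucket = bucket.next
        bucket.urls.append(url)

    def lookup(self, word):
        urls = []
        bucket = self.first_bucket.get(word)
        while bucket is not None:
            urls.extend(bucket.urls)
            bucket = bucket.next
        return urls
```

Looking up a word follows the pointer chain from the first bucket and returns the URLs in insertion order.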

In addition to indexing words extracted from the text of the document the indexing program
also indexes the CCG-data in the document as well as indexing words found in the CCG-data.
When the parser finds HTML element "<XML>" in the document it switches into XML parsing
mode and switches out of that mode when "</XML>" is found. When the element "<CCG>" is
found, the parser switches into the CCG parsing mode and switches out of that mode when
"</CCG>" is found.

The example database has a CCG-data attribute name to database property name
correspondence table to show the relationship between the CCG-data attribute names and the
database tables and columns (properties) where the CCG-data attribute values are to be
stored in the database as database property values. The database property values and
associated URLs are stored in much the same way as for words extracted from text as
outlined above. However, CCG contact data, for example, which consists of several distinct
CCG-data attributes which are related (eg street name, city), is stored in a database table
having a column (property) related to each distinct CCG contact attribute name and each
separate CCG contact data set (eg person's name, address, telephone number) as separated
by "<CCG>", "<SET_SEPARATOR>" and "</CCG>" is held in a separate row in the table. The
values stored in each row are considered to be a set of associated property values of different
types.

The indexing program, during parsing the document of Example 4 above, encounters the
"<CCG>" element and enters the CCG parsing mode. The parser knows to ignore display
control attributes and to consider database control elements in the CCG phrase. The example
indexing program opts to index all other CCG-data contained in the attribute values until
explicitly instructed not to index the attribute values by encountering the "<NOINDEX/>"
database control element and then to recommence indexing when the "<INDEX/>" database
control element is encountered.

Taking each CCG-data attribute name and associated attribute value(s) in succession, the
example indexing program uses the correspondence table to translate the CCG-data attribute
name to the database table and column (property) names where the CCG-data attribute
value(s) are to be stored as database property value(s). The indexing program may opt to
translate the CCG-data attribute values to database property values by, for example,
converting character strings of digits to binary encoded decimal representation, the string
"True" to a single bit representation and the like. The indexing program then adds or updates
the database property value(s), using the database table and column (property) names (or
similar references) obtained by translation, in much the same manner as outlined above for the
update of the database using words extracted from the document text, including associating
the data to the document URL where desired. Where the CCG-data contains an "HREF"
attribute (or similar), the URL associated with the other CCG-data is a URL taken from the
"HREF" attribute value or composed of the document URL and the "HREF" attribute value if
the attribute value is a partial or relative URL. Some CCG attributes, such as "<BUSINESS/>",
have only an implied value of true if the attribute is present and false if the attribute is absent,
the "<SET_SEPARATOR/>", "<CCG>" and "</CCG>" resetting such values to false. However,
where attribute value(s) associated with different attribute names are related, such as a
person's name and a street name, the related values of different types are stored on the same
row of the same database table but in a different column (database property) to preserve the
relationship. "<SET_SEPARATOR/>" limits the degree of relatedness between, for example, a
person's name occurring before the separator and a street name occurring after the separator.
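
A rough Python sketch of this translation step is given below. It is an illustration under stated assumptions rather than the indexing program of the specification: the correspondence table contents, the attribute names "PN", "STREET" and "POSTCODE", the example.com URLs and the use of a single table are all hypothetical, and digit strings are converted to plain integers instead of binary encoded decimal.

```python
from urllib.parse import urljoin

# Hypothetical correspondence table: CCG-data attribute name -> (table, column).
CORRESPONDENCE = {
    "PN": ("contacts", "PN"),
    "STREET": ("contacts", "STREET"),
    "POSTCODE": ("contacts", "POSTCODE"),
}
IMPLIED_FLAGS = {"BUSINESS"}  # attributes whose presence alone means true


def convert(value):
    """Translate an attribute value to a database property value."""
    if value.isdigit():
        return int(value)  # assumption: integers stand in for BCD storage
    if value == "True":
        return True        # "True" stored as a single bit
    return value


def build_rows(tokens, document_url):
    """Group related attribute values into rows, one row per set."""
    rows, row = [], {}
    url = document_url

    def flush():
        # Close the current set: attach its URL and reset implied flags.
        nonlocal row, url
        if row:
            row["URL"] = url
            for flag in IMPLIED_FLAGS:
                row.setdefault(flag, False)
            rows.append(row)
        row, url = {}, document_url

    for name, value in tokens:
        if name == "SET_SEPARATOR":
            flush()
        elif name == "HREF":
            url = urljoin(document_url, value)  # resolves relative URLs
        elif name in IMPLIED_FLAGS:
            row[name] = True
        elif name in CORRESPONDENCE:
            _table, column = CORRESPONDENCE[name]  # single table assumed
            row[column] = convert(value)
    flush()
    return rows


tokens = [
    ("PN", "John"),
    ("BUSINESS", None),
    ("HREF", "staff/john.html"),
    ("SET_SEPARATOR", None),
    ("STREET", "William Street"),
    ("POSTCODE", "2795"),
]
rows = build_rows(tokens, "http://www.example.com/index.html")
```

Values on either side of the separator land on different rows, so the person's name and the street name in this sample are not treated as related.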

Using the example document and using the same database column (property) names as used
for the CCG-data attribute names, a portion of the constructed database table would look
like:

[Table not recoverable from the source. Surviving fragments: column names "PT" and "URL"; cell values "John", "Managing Director" and "(pointer)"; unreadable fragments "AIE" and "ARUC".]

Difficulties not highlighted by this example are the need to handle properties having multiple
values of the same type, "sparse rows" where only a few values are not null (blank) and tables
with extremely large numbers of rows. For example, the CCG-data of this example could have
contained multiple values of personal qualifications ("PQ"). To represent this type of data using
a 2 dimensional table database system, the database would be "normalised" so that the
multiple values were stored on a separate table and keys or pointers were used to relate the
items in the two tables. Numerous alternate database systems, for example those based on
key hashing and data buckets or tagging data values with prefixes or suffixes related to the
type of data value, may be used. Preferably, however, whatever database system is used, it
should preserve the associations of CCG-data items present in the CCG phrases.
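
The normalised layout for a multi-valued property can be illustrated with a small sketch using Python's built-in sqlite3 module. The table names, column names, qualification values and URL are hypothetical and chosen only to show how a key relates the rows of the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contact (
        id  INTEGER PRIMARY KEY,
        pn  TEXT,            -- person's name
        url TEXT             -- page the CCG phrase came from
    );
    CREATE TABLE qualification (
        contact_id INTEGER REFERENCES contact(id),
        pq         TEXT      -- one personal qualification per row
    );
    """
)

# One contact row, with the multiple qualification values held separately.
cur = conn.execute(
    "INSERT INTO contact (pn, url) VALUES (?, ?)",
    ("John", "http://www.example.com/john.html"),
)
contact_id = cur.lastrowid
conn.executemany(
    "INSERT INTO qualification (contact_id, pq) VALUES (?, ?)",
    [(contact_id, "BE"), (contact_id, "MBA")],
)

# The key column restores the association between the two tables.
quals = [
    row[0]
    for row in conn.execute(
        "SELECT q.pq FROM qualification AS q "
        "JOIN contact AS c ON c.id = q.contact_id "
        "WHERE c.pn = ? ORDER BY q.pq",
        ("John",),
    )
]
```

Because the qualifications live in their own table, the contact table stays free of repeated or mostly empty columns while the association between the items is still preserved.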
Because the geographic location data was missing from the postal address of the CCG-data in
the example document, but a post code was present, the indexing program inferred the
geographic location from the post code.

Example 6: Finding Web Page References Using a CCG Database 

As an example, Kevin Robson lives in Sydney but owns and has rented out a house in
Bathurst. He wants to use the web to find some electricians based in the general Bathurst
region (not only in Bathurst City) to contact for estimating the cost of modifying the wiring in the
house. He uses his web browser to open the web page
"http://www.ausline.com.au/web_search.html" containing AusLine's search engine web page
search criteria input form encoded using the HTML "<form>" element.

The search criteria input form contains several input fields including those labelled "Service
classification", "Key words", "City/Suburb/Town", "Country", "Lat/Long" and "Radius". The form
also displays a button labelled "Map" to allow latitude and longitude to be selected by pointing
to map images. The word "electrician" is typed into the "Service classification" field, "house
wiring" into the "Key words" field, "Bathurst" into the "City/Suburb/Town" field and "10" into the
field "Radius". The country "Australia" was already showing in the country field because the
web page server had received cookie data from the browser indicating that that was the
country used when the browser last used the web page. The "Submit search" button on the
web page was clicked. The browser transmitted a message using TCP/IP protocol to the
AusLine server containing the input field values encoded in the header of the message.

After a short delay, the search result HTML encoded web page was returned. Clicking on the
"Service classification" input field drop down list box to check the classifications used in the
search revealed three items:

• Electrical contractors - residential
• Electrical contractors - industrial
• Electrical engineers

The search engine attached to the server obtained those classifications by using word
stemming and searching the text of the service classifications held in its database. The
Lat/Long field contained the value "33.3856S;148.5743E" which the search engine obtained
by looking up the latitude and longitude of the town "Bathurst" in the country "Australia" in its
database. Clicking on the "Map" button retrieved a web page having the image of a map
centred on the town of Bathurst and showing the area 20 Km around it. The search engine
obtained the map by making a request to another Internet connected server and supplying the
latitude, longitude and radius. Clicking on the browser "Back" button returned to the search
results page.

The search results contained 8 titles, brief descriptions and URLs including a reference
containing the URL "http://www.firefly.com.au/~johnw/M…". Retrieving each in turn
revealed that all were well focused according to the search criteria, being related to electricians,
electrical contractors and engineers in the Bathurst area. The search engine obtained these
references to web pages by:

• searching its database of service classification titles with words stemming from
"electrician" which resulted in three service classification codes,

• searching its database using the three service classification codes to obtain an
intermediate list of URLs of web pages containing those CCG codes,

• searching its database for the two keywords to obtain an intermediate list of URLs of
web pages containing those words in the web page text,

• searching its database to find the latitude and longitude of Bathurst,

• searching its database to obtain an intermediate list of web pages which contain
latitude and longitude data lying within 10 Km of the latitude and longitude of
Bathurst, Australia,

• producing as a result list, a list of URLs which are common to all the intermediate lists,

• obtaining from its database the title and brief description of the web pages,

• formatting the titles, descriptions and URLs into an HTML encoded report,

• transmitting the report to the enquiring web browser.
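
The core of these steps, building intermediate lists and keeping only the URLs common to all of them, can be sketched in Python as follows. This is a simplified illustration and not AusLine's search engine: the in-memory index, the classification codes "E1" to "E3" and "P9", the URLs and the coordinates are all hypothetical, and the distance test uses the haversine formula as one possible way of deciding whether a page lies within the radius.

```python
import math


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Hypothetical index entries: URL -> (classification codes, words, lat, lon).
# Southern latitudes are written as negative numbers.
PAGES = {
    "http://a.example/": ({"E1"}, {"house", "wiring"}, -33.40, 149.57),
    "http://b.example/": ({"E2"}, {"house", "wiring"}, -33.87, 151.21),
    "http://c.example/": ({"P9"}, {"house", "wiring"}, -33.41, 149.58),
}


def search(codes, keywords, centre, radius_km):
    """Intersect the three intermediate lists of URLs."""
    by_code = {u for u, (c, _, _, _) in PAGES.items() if c & codes}
    by_words = {u for u, (_, w, _, _) in PAGES.items() if keywords <= w}
    by_place = {
        u for u, (_, _, lat, lon) in PAGES.items()
        if distance_km(lat, lon, *centre) <= radius_km
    }
    return sorted(by_code & by_words & by_place)


hits = search({"E1", "E2", "E3"}, {"house", "wiring"}, (-33.42, 149.58), 10)
```

In this sample only the first page survives: the second is too far from the centre point and the third carries a classification code outside the requested set.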

Example 7: Finding Contact Details Using a CCG Database

As an example, Jim Jones of Jones and Sons wants to send a recall notice about a faulty
batch of UV stabilised electrical power cable to all Electrical contractors and Electrical
wholesalers in Australia who have email addresses. He uses his web browser to open the web
page "http://www.ausline.com.au/contact_search.html" containing AusLine's search engine
contact search criteria input form encoded using the HTML "<form>" element.

The search criteria input form contains several input fields including those labelled "Service
classification", "Country" and "Output format". The word "electric" is typed into the "Service
classification" field, the word "Australia" is typed into the "Country" field and the "Tabular -
Name & Email" option in the "Output format" drop down list box is selected. The "Submit
search" button on the web page is clicked. The browser transmits a message using TCP/IP
protocol to the AusLine server containing the input field values encoded in the header of the
message.

After a short delay, the search result HTML encoded web page is returned. Clicking on the
"Service classification" input field drop down list box to check the classifications used in the
search revealed too many classifications for the result to be sufficiently focused. The following
four classifications were selected from the list:

• Electric cable - ducting systems
• Electrical contractors - residential
• Electrical contractors - industrial
• Electrical wholesalers

and the "Submit search" button is pressed again to refine the search.

The search results contained 3,473 names and associated email addresses and URLs to full
contact details. Jim saved the search result page on his computer so that he could use his
email program to send the recall notice to each email address in the list. The email address
"johnw@firefly.com.au" was included in the list.

The search engine obtained these references to web pages by: 

• searching its database using the four service classification titles which resulted in four
service classification codes,

• searching its database using the four service classification codes to obtain an
intermediate list of database primary keys of database table rows containing those
service classification codes in the database Service classification attribute,

• searching its database using the country name "Australia" to obtain an intermediate
list of database primary keys of database table rows containing that word in the
database Country attribute,

• producing as a result list a list of database primary keys which are common to both
the intermediate lists,

• obtaining from its database using the result list the values of the name and email
attributes,

• using the HTML <table> element to format the name values, email values and full
detail URLs into an HTML encoded report,

• transmitting the report to the enquiring web browser.
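
A compact Python sketch of the key intersection and report formatting follows. It is illustrative only: the rows, classification codes, names and most email addresses are hypothetical, the database is a plain dictionary keyed by primary key, and the full detail URL column is omitted for brevity.

```python
from html import escape

# Hypothetical database rows keyed by primary key.
ROWS = {
    1: {"code": "EC-RES", "country": "Australia",
        "name": "John W", "email": "johnw@firefly.com.au"},
    2: {"code": "EC-IND", "country": "New Zealand",
        "name": "Ann B", "email": "ann@example.com"},
    3: {"code": "PLUMB", "country": "Australia",
        "name": "Bob C", "email": "bob@example.com"},
}


def contact_report(codes, country):
    """Intersect two intermediate key lists and format an HTML table."""
    keys_by_code = {k for k, r in ROWS.items() if r["code"] in codes}
    keys_by_country = {k for k, r in ROWS.items() if r["country"] == country}
    result_keys = sorted(keys_by_code & keys_by_country)

    lines = ["<table>"]
    for key in result_keys:
        row = ROWS[key]
        # escape() guards against markup characters in stored values
        lines.append(
            f"<tr><td>{escape(row['name'])}</td><td>{escape(row['email'])}</td></tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)


report = contact_report({"EC-RES", "EC-IND"}, "Australia")
```

Only the row whose key appears in both intermediate lists is written to the report, which mirrors the "common to both" step in the list above.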

This example relates to finding sets of associated database contact values without requiring
references to web pages. However, finding other sets of associated database values such as
sets of associated industry classification values and geographic location values might also be
useful for some purposes.

Thus it is appreciated that the afore stated goals, advantages and objectives are achieved by
the teachings herein. In particular it is seen that, unlike the prior art, efficiently searchable
Yellow pages and White pages databases and the like may be automatically constructed from
HTML encoded web pages. Additionally the database entries may be automatically linked to
specific web pages and portions of web pages allowing convenient methods of indexing of
product and service catalogues and the like. It is also appreciated that simpler methods of
constructing databases suited to a variety of other uses such as industry and subject
directories are also provided.

From the foregoing teachings and with the knowledge of those skilled in the art, other
modifications and adaptations of the invention will become apparent. For example,
the method steps disclosed and claimed herein may be practiced in a variety of different
orders. CCG-data may take on a variety of different forms within the meaning of the claims.
Thus, it is our intention to include within the scope of the claims not only the invention literally
embraced by the language of the claims but to include all such modifications and adaptations
which may come to those skilled in the art.
What I claim is:

1. An HTML encoded web page embodied on a computer-readable medium, said web
page comprising at least one HTML encoded CCG phrase, each CCG phrase
comprising:

a) HTML code indicative of the start of a CCG phrase,

b) at least one CCG-data attribute, and

c) HTML code indicative of the end of a CCG phrase.

2. An HTML encoded web page embodied on a computer-readable medium, said web
page comprising at least one HTML encoded CCG phrase, each CCG phrase
comprising:

a) HTML code indicative of the start of a CCG phrase,

b) at least two CCG-data attributes,

c) at least one database control attribute separating said CCG-data attributes into at
least two sets of CCG attributes, and

d) HTML code indicative of the end of a CCG phrase.

3. An HTML encoded web page embodied on a computer-readable medium, said web
page comprising at least one HTML encoded CCG phrase, each CCG phrase
comprising:

a) HTML code indicative of the start of a CCG phrase,

b) at least one CCG-data attribute,

c) at least one attribute of: database control attributes, display control attributes, and

d) HTML code indicative of the end of a CCG phrase.

4. A computer implemented method of building a web page comprising at least one HTML
encoded CCG phrase, the method comprising the steps of:

a) displaying a web page on a computer display device,

b) displaying an edit cursor indicating a character position on said display device and
a corresponding character position in said web page, said edit cursor being
positionable within the display of said web page by use of computer input devices,

c) separately displaying on said computer display device a set of edit controls
representing CCG-data attribute types,

d) positioning said edit cursor within said display of said web page using said input
devices,

e) selecting an edit control from said set of edit controls using said input devices,

f) relating said selected edit control to a corresponding CCG-data attribute name,

g) constructing a CCG-data attribute character string comprising a character string
representing said attribute name and another character string representing an
empty CCG-data value,

h) if the said edit cursor is positioned outside a CCG phrase,

   i) inserting into said web page, at the character position indicated by said edit
   cursor, a start character string comprising HTML code indicative of the start
   of a CCG phrase,

   ii) inserting into said web page, immediately after the end of said start
   character string, an end character string comprising HTML code indicative of
   the end of a CCG phrase, and

   iii) positioning said edit cursor between said start and end character strings,

i) inserting said CCG-data attribute character string into said web page at the
character position indicated by said edit cursor,

j) positioning said edit cursor at the character position in said web page of the CCG-data
value of said inserted CCG-data attribute character string,

k) inputting characters using a keyboard,

l) inserting said input characters into said web page at the character position
indicated by said edit cursor, thereby converting said empty CCG-data value to a
non-empty CCG-data value, and

m) writing said web page on computer-readable media.

5. A computer implemented method of building a web page comprising at least one HTML
encoded CCG phrase, the method comprising the steps of:

a) displaying a web page on a computer display device,

b) displaying a start edit cursor and an end edit cursor on said display device, each
said edit cursor indicating a character position on said display device and a
corresponding character position in said web page, said edit cursors being
positionable within the display of said web page by use of computer input devices,

c) separately displaying on said computer display device a set of edit controls
representing CCG-data attribute types,

d) selecting a string of web page characters on said display device using said input
devices to position said start edit cursor to indicate the start of said string of web
page characters and said end edit cursor to indicate the end of said string of web
page characters,

e) selecting an edit control from said set of edit controls using said input devices,

f) relating said selected CCG-data control to a corresponding CCG-data attribute
name,

g) constructing a CCG-data attribute character string comprising a character string
representing said attribute name and another character string representing a CCG-data
value containing said string of web page characters,

h) deleting said string of web page characters from said web page,

i) if the said start edit cursor is positioned outside a CCG phrase,

   i) inserting into said web page, at the character position indicated by said start
   edit cursor, a start character string comprising HTML code indicative of the
   start of a CCG phrase,

   ii) inserting into said web page, immediately after the end of said start
   character string, an end character string comprising HTML code indicative of
   the end of a CCG phrase, and

   iii) positioning said start edit cursor between said start and end character
   strings,

j) inserting said CCG-data attribute character string into said web page at the
character position indicated by said start edit cursor, thereby converting said string
of web page characters to a CCG-data attribute value contained within a CCG-data
attribute contained within a CCG phrase, and

k) writing said web page on computer-readable media.

6. A computer implemented method of building a web page comprising at least one HTML
encoded CCG phrase, the method comprising the steps of:

a) displaying a CCG-data input form on a computer display device,

b) inputting CCG-data values into fields of said data input form using computer input
devices,

c) inserting into the body of a web page a start character string comprising HTML
code indicative of the start of a CCG phrase,

d) inserting into said web page body immediately after the end of said start character
string an end character string comprising HTML code indicative of the end of a
CCG phrase,

e) extracting successive field values from said data entry form together with related
field value type information,

f) relating the type of each extracted field value to a corresponding CCG-data
attribute name,

g) constructing a CCG-data attribute character string comprising a character string
representing said attribute name and another character string representing said
field value,

h) inserting said CCG-data attribute character string into said web page between said
start and end character strings, and

i) writing said web page on computer-readable media.

7. A computer implemented method of building a database which comprises sets of
associated property values wherein each set includes at least two property values of
different types, the property values being any of classification values, contact values,
geographic location values, hereinafter collectively referred to as CCG-data, the method
comprising the steps of:

a) retrieving successive web pages from a computer network, each web page being
identified by a URL,

b) searching each web page for a CCG phrase that includes a plurality of different
types of CCG-data attributes,

c) extracting a plurality of said attributes from said phrase,

d) from each extracted attribute, deriving an attribute name and a related attribute
value,

e) determining the type of said extracted attribute and said attribute value by
reference to said attribute name,

f) relating said type of attribute value so determined to a corresponding type of
database property value,

g) relating the URL of said web page to an other type of database property value,

h) writing said derived attribute value to the database property value of said
determined corresponding type in a set of associated property values, and

i) writing the URL of said web page to a database property value of said other type
in said set of associated property values.

8. A computer implemented method of building a database which comprises sets of
associated property values wherein each set includes at least two property values of
different types, the property values being any of classification values, contact values,
geographic location values, hereinafter collectively referred to as CCG-data, the method
comprising the steps of:

a) retrieving successive web pages from a computer network, each web page being
identified by a URL,

b) searching each web page for a CCG phrase that includes at least one type of
CCG-data attribute,

c) extracting at least one said attribute from said phrase,

d) from each extracted attribute, deriving an attribute name and a related attribute
value,

e) determining the type of said extracted attribute and said attribute value by
reference to said attribute name,

f) relating said type of attribute value so determined to a corresponding type of
database property value,

g) relating the URL of said web page to an other type of database property value,

h) writing said derived attribute value to the database property value of said
determined corresponding type in a set of associated property values, and

i) writing the URL of said web page to a database property value of said other type
in said set of associated property values.

9. A computer implemented method of building a database which comprises sets of
associated property values wherein each set includes at least two property values of
different types, the property values being any of classification values, contact values,
geographic location values, hereinafter collectively referred to as CCG-data, the method
comprising the steps of:

a) retrieving successive web pages from a computer network,

b) searching each web page for a CCG phrase that includes a plurality of different
types of CCG-data attributes,

c) extracting a plurality of said attributes from said phrase,

d) from each extracted attribute, deriving an attribute name and a related attribute
value,

e) determining the type of said extracted attribute and said attribute value by
reference to said attribute name,

f) relating said type of attribute value so determined to a corresponding type of
database property value, and

g) writing said derived attribute value to the database property value of said
determined corresponding type in a set of associated property values.

10. A computer implemented method of finding references to web pages posted on a
computer network, the method using a database comprising sets of associated property
values, the property values being any of classification values, contact values, geographic
location values, hereinafter collectively referred to as CCG-data, and URL references,
the method comprising the steps of:

a) receiving a query phrase including query relational expressions from a computer
network,

b) parsing said query phrase and extracting each of said query relational expressions
included therein,

c) from each extracted query relational expression, deriving a query field name,

d) determining the type of said query relational expression by reference to its derived
query field name,

e) relating said type of query relational expression so determined to one of the
following query relational expression types: CCG-data type, other type,

f) provided said query relational expression is a CCG-data type, deriving a query
relational operator and query value related to its query field name from said query
relational expression,

g) determining the type of said query value by reference to said query field name,

h) relating said type of query value so determined to a corresponding type of
database property value,

i) locating database property values of said determined corresponding type which
return a true value when tested against said query value using said query
relational operator, and

j) extracting from said database a list of the URL references associated with the so
located database property values.

11. A computer implemented method of finding sets of associated database property values,
the method using a database comprising sets of associated property values wherein
each set includes at least two property values of different types, the property values
being any of classification values, contact values, geographic values, hereinafter
collectively referred to as CCG-data, the method comprising the steps of:

a) receiving a query phrase including query relational expressions from a computer
network,

b) parsing said query phrase and extracting each of said query relational expressions
included therein,

c) from each extracted query relational expression, deriving a query field name,

d) determining the type of said query relational expression by reference to its derived
query field name,

e) relating said type of query relational expression so determined to one of the
following query relational expression types: CCG-data type, other type,

f) provided said query relational expression is a CCG-data type, deriving a query
relational operator and query value related to its query field name from said query
relational expression,

g) determining the type of said query value by reference to said query field name,

h) relating said type of query value so determined to a corresponding type of
database property value,

i) locating database property values of said determined corresponding type which
return a true value when tested against said query value using said query
relational operator, and

j) extracting from said database sets of associated database property values
associated with the so located database property values.

12. A method of displaying a web page comprising at least one HTML encoded CCG
phrase, the method comprising the steps of:

a) retrieving a web page from a computer network,

b) parsing said retrieved web page to locate an HTML code indicative of the start of a
CCG phrase,

c) parsing said located CCG phrase and extracting successive CCG attributes
contained therein until an HTML code indicative of the end of said CCG phrase is
found,

d) from each extracted attribute, deriving an attribute name,

e) determining the type of said extracted attribute by reference to its derived attribute
name,

f) relating said type of attribute so determined to one of the following attribute types:
database control, display control, CCG-data,

g) provided said extracted attribute is not a database control type, deriving an
attribute value related to its attribute name from said extracted attribute,

h) determining the type of said attribute value by reference to said attribute name,

i) relating said type of attribute value so determined to a corresponding type of
parameter of a display-device-control-program,

j) writing said attribute value to said parameter, and

k) where said type of attribute is a CCG-data type, causing said display-device-control-program
to effect display of said attribute value on a display device,
formatted and positioned according to said display-device-control-program
parameters whereby successive values of CCG-data of the CCG phrase are
displayed.
ABSTRACT

A system for automatically creating databases containing industry, service, product and
subject classification data, contact data, geographic location data (CCG-data) and links to web
pages from HTML, XML or SGML encoded web pages posted on computer networks such as
the Internet or Intranets. The web pages containing HTML, XML or SGML encoded CCG-data,
database update controls and web browser display controls are created and modified by using
simple text editors, HTML, XML or SGML editors or purpose built editors. The CCG databases
may be searched for references (URLs) to web pages by use of enquiries which reference one
or more of the items of the CCG-data. Alternatively, enquiries referencing the CCG-data in the
databases may supply contact data without web page references. Data duplication and
coordination is reduced by including in the web page CCG-data display controls which are
used by web browsers to format for display the same data that is used to automatically update
the databases.